Received: from relay2.UU.NET by optima.CS.Arizona.EDU (5.65c/15) via SMTP
id AA21109; Sat, 1 May 1993 21:48:00 MST
Received: from spool.uu.net (via LOCALHOST.UU.NET) by relay2.UU.NET with SMTP
(5.61/UUNET-internet-primary) id AA23076; Sun, 2 May 93 00:48:09 -0400
Received: from sangam.UUCP by spool.uu.net with UUCP/RMAIL
(queueing-rmail) id 004726.20115; Sun, 2 May 1993 00:47:26 EDT
Received: by sangam.ncst.ernet.in (4.1/SMI-4.1-MHS-7.0)
id AA11843; Sun, 2 May 93 10:08:15+0530
Received: from kailash.cse.iitb.ernet.in by iitb.ernet.in
SENDMAIL Version (4.1/SMI-4.1-MHS-7.0)
id AA22512; Sun, 2 May 93 10:00:42+0530
Received: by kailash.cse.iitb.ernet.in (4.1/SMI-4.1)
id AA03479; Sun, 2 May 93 10:02:07 IST
Date: Sun, 2 May 93 10:02:07 IST
From: nls@cse.iitb.ernet.in (N L Sarda)
Message-Id: <9305020432.AA03479@kailash.cse.iitb.ernet.in>
To: tsql@cs.arizona.edu
BENCHMARK QUERIES : ON THEIR CLASSIFICATION
In our efforts to define classes of queries to be used as benchmarks,
I felt that the classification could also be worked out from the user's
point of view (in addition to the classification done by C. S. Jensen
based on SQL format), so that certain types, expected to be more
frequent than others, can be emphasized. The following is an
exploration in this direction.
Our target database contains historical data of one or more sets of
entities. An entity has many facts stored about it in the database.
A fact is true over some valid-time interval. There is one 'current'
fact and the others are history (past facts). A real-world entity
may be 'in and out' of our database at various times (e.g., an
employee being fired and re-hired). Thus, it exists at some times and
not at others.
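As a rough illustration of this data model (a sketch only, not part of
any proposal), facts can be represented as records carrying a valid-time
interval. The names Fact, current_fact and exists_at, the use of integer
years as instants, the half-open interval convention and the use of None
for "still valid" are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fact:
    entity: str          # entity identifier
    value: str           # non-temporal content of the fact
    start: int           # first instant at which the fact holds
    end: Optional[int]   # first instant at which it no longer holds;
                         # None marks the 'current' fact (assumption)


def current_fact(facts, entity):
    """Return the open-ended (current) fact of the entity, or None."""
    for f in facts:
        if f.entity == entity and f.end is None:
            return f
    return None


def exists_at(facts, entity, t):
    """True if some fact about the entity is valid at instant t."""
    return any(
        f.entity == entity and f.start <= t and (f.end is None or t < f.end)
        for f in facts
    )


# An employee hired in 1985, fired in 1988 and re-hired in 1991.
facts = [
    Fact("emp1", "engineer", 1985, 1988),
    Fact("emp1", "manager", 1991, None),
]

print(exists_at(facts, "emp1", 1986))   # True
print(exists_at(facts, "emp1", 1989))   # False: 'out' of the database
print(current_fact(facts, "emp1").value)  # manager
```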
Our retrieval may obtain data from one entity set or from multiple
entity sets. A particular case of interest in a multiple-entity-set
query is when the entities existed concurrently.
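One possible reading of "existed concurrently" is that the existence
periods of the two entities overlap. The sketch below assumes half-open
(start, end) intervals over integer years; the function names overlap
and concurrent and the sample data are hypothetical.

```python
def overlap(a, b):
    """Intersection of two half-open intervals, or None if disjoint."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start < end else None


def concurrent(periods_a, periods_b):
    """All periods during which both entities existed."""
    result = []
    for a in periods_a:
        for b in periods_b:
            common = overlap(a, b)
            if common is not None:
                result.append(common)
    return sorted(result)


employee = [(1985, 1988), (1991, 1995)]   # fired in 1988, re-hired in 1991
project = [(1987, 1992)]

print(concurrent(employee, project))   # [(1987, 1988), (1991, 1992)]
```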
Our retrieval might ask for the full facts stored in the database, or
for parts of facts with coalescing and/or time-slicing. The latter
limits the time values in the result to the time-slice boundaries.
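To make the two operations concrete, here is a minimal sketch under
stated assumptions: rows are (value, start, end) tuples with half-open
integer intervals, coalescing merges overlapping or adjacent periods of
rows with equal value, and time-slicing clips periods to a window. The
function names and the sample relation are invented for illustration.

```python
from collections import defaultdict


def coalesce(rows):
    """Merge overlapping or adjacent periods of rows with equal value."""
    by_value = defaultdict(list)
    for value, start, end in rows:
        by_value[value].append((start, end))
    result = []
    for value, periods in by_value.items():
        periods.sort()
        cur_start, cur_end = periods[0]
        for start, end in periods[1:]:
            if start <= cur_end:             # touches or overlaps: extend
                cur_end = max(cur_end, end)
            else:                            # gap: emit and start afresh
                result.append((value, cur_start, cur_end))
                cur_start, cur_end = start, end
        result.append((value, cur_start, cur_end))
    return sorted(result)


def time_slice(rows, lo, hi):
    """Clip each period to [lo, hi); drop rows falling outside it."""
    sliced = []
    for value, start, end in rows:
        s, e = max(start, lo), min(end, hi)
        if s < e:
            sliced.append((value, s, e))
    return sliced


# Full facts: (department, salary, start, end).
facts = [
    ("toys", 100, 1985, 1987),
    ("toys", 120, 1987, 1990),
    ("books", 120, 1990, 1993),
]

# Retrieving only part of each fact (the department) leaves two
# adjacent 'toys' rows, which coalescing merges into one.
departments = [(dept, start, end) for dept, _salary, start, end in facts]
print(coalesce(departments))
# [('books', 1990, 1993), ('toys', 1985, 1990)]
print(time_slice(coalesce(departments), 1986, 1991))
# [('books', 1990, 1991), ('toys', 1986, 1990)]
```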
The focus of retrieval may be only the current data, only the past
data, or the historical data (current + past). Alternatively, the user
may want to obtain aggregated results.
The retrieval may be constrained by a predicate on non-temporal as
well as temporal attributes. We may focus only on the latter for
defining a taxonomy for our benchmark queries.
A large number of time domain operators (or functions) can be
identified for constructing temporal predicates. Various query
languages may differ with respect to which operators they include.
However, since languages can be easily enriched with more of these
operators/functions, it may not be necessary to define query classes
based on these operators.
We now state our query classes formally. We use COBOL-style notation
for this: square brackets mark an optional item and braces a choice
among alternatives. Words outside brackets/braces are for
readability. (Please pardon the ugly representation of large braces.)
------------------------------------------------------------
A query class is

  [CONCURRENT] [TIME-SLICED] [COALESCED]

      | CURRENT    |
      | PAST       |
      { HISTORICAL }
      | AGGREGATED |
      |            |

  retrieval

  [
    based on

      | EXISTENCE     |
      {               }  of
      | NON-EXISTENCE |

    [AGGREGATED] relationship

      | IN / WITH |
      {           }
      | AT ALL    |

      | INSTANT  |
      { INTERVAL }
      | ELEMENT  |
      | DURATION |

    specified by

      | user          |
      {               }
      | computed from |
      | other data    |
  ]
----------------------------------------------------------
Some example classes obtained from the above categorization of
query classes:
1. current retrieval (based on non-temporal predicate)
2. historical retrieval based on existence relationship in
(an) interval specified by user.
3. time-sliced coalesced historical retrieval based on existence
relationship at all (instants in the) interval given (a) by
user, and (b) computed (possibly using another retrieval).
4. concurrent historical retrieval based on existence relationship
with an interval.
5. past retrieval based on non-existence relationship at all
(instants in an) interval given by user.
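The difference between the IN, AT ALL and NON-EXISTENCE variants used
in these examples can be sketched as three predicates over an entity's
existence periods. This is only one plausible interpretation: half-open
integer intervals, the function names, and the reading of "IN" as "at
some instant of" and "AT ALL" as "at every instant of" are assumptions.

```python
def exists_in(periods, lo, hi):
    """EXISTENCE IN: the entity existed at some instant of [lo, hi)."""
    return any(start < hi and lo < end for start, end in periods)


def exists_at_all(periods, lo, hi):
    """EXISTENCE AT ALL: the entity existed at every instant of [lo, hi)."""
    covered = lo                      # everything in [lo, covered) is covered
    for start, end in sorted(periods):
        if start > covered:
            break                     # a gap begins at `covered`
        covered = max(covered, end)
    return covered >= hi


def absent_at_all(periods, lo, hi):
    """NON-EXISTENCE AT ALL: the entity existed at no instant of [lo, hi)."""
    return not exists_in(periods, lo, hi)


employee = [(1985, 1988), (1991, 1995)]   # fired in 1988, re-hired in 1991

print(exists_in(employee, 1987, 1992))      # True
print(exists_at_all(employee, 1987, 1992))  # False: gap from 1988 to 1991
print(absent_at_all(employee, 1988, 1991))  # True
```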
The above definition also leads to a fairly large number of query
classes. It may be desirable to cut down the number by not
considering every possible combination.
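To get a feel for how large the number is, the combinations can simply
be enumerated. The sketch below is one reading of the class format:
it treats "IN / WITH" as a single alternative, ignores the blank row in
the first brace, and counts the whole "based on" clause as optional.
The list names are invented for this example, and a different reading
of the format would give a different count.

```python
from itertools import product

OPTIONAL_FLAGS = ["CONCURRENT", "TIME-SLICED", "COALESCED"]
RETRIEVAL = ["CURRENT", "PAST", "HISTORICAL", "AGGREGATED"]
EXISTENCE = ["EXISTENCE", "NON-EXISTENCE"]
AGG_RELATIONSHIP = ["", "AGGREGATED"]          # optional word
QUANTIFIER = ["IN / WITH", "AT ALL"]           # assumed: two alternatives
TIME_KIND = ["INSTANT", "INTERVAL", "ELEMENT", "DURATION"]
SOURCE = ["user", "computed from other data"]


def query_classes():
    """Yield every class name permitted by the format, as a string."""
    flag_choices = list(product([False, True], repeat=len(OPTIONAL_FLAGS)))
    clauses = [None] + list(
        product(EXISTENCE, AGG_RELATIONSHIP, QUANTIFIER, TIME_KIND, SOURCE)
    )
    for flags, kind, clause in product(flag_choices, RETRIEVAL, clauses):
        words = [name for name, on in zip(OPTIONAL_FLAGS, flags) if on]
        words += [kind, "retrieval"]
        if clause is not None:
            exist, agg, quant, time_kind, source = clause
            words += ["based on", exist, "of"]
            if agg:
                words.append(agg)
            words += ["relationship", quant, time_kind, "specified by", source]
        yield " ".join(words)


classes = list(query_classes())
print(len(classes))   # 2080 = 8 * 4 * (1 + 2*2*2*4*2)
```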
The classification above can be easily related to the taxonomy
proposed by Jensen as follows:
a) concurrent multi-entity retrieval corresponds to imposed interval
valid-time component in output and containment-based operator on
intervals of participating entities (i.e., relations) in the
selection-based taxonomy. There are many ways of relating entities
other than their concurrent existence, but concurrent will be
more commonly required. The other categories need to be
explicitly 'programmed' using suitable operators in selection
and computations in output.
b) time-slicing can also be expressed using interval derivation in
output and containment operator in selection.
c) there is no easy way of specifying coalescing without complicated
grouping and computations. The situation is similar to some extent
to that of 'DISTINCT' in SQL, where duplicates are not automatically
eliminated. We expect that coalescing may also not be done
automatically in temporal SQLs.
d) queries on current and past data are straightforward to
express in Jensen's taxonomy also, but are mentioned explicitly
here as a class because of their importance (likely to be
more frequent).
e) the word 'relationship' in the above class definition format
represents various possible time domain operators (including
the duration, ordering and containment based operators
defined in Jensen's taxonomy).
f) the queries based on 'non-existence' are emphasized as they
are easy to state in English, but difficult to formulate in
SQL (they usually need nested queries).
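As an illustration of point (f), the English request "who was not
employed at any time during a given interval" can be written in
ordinary SQL with a nested NOT EXISTS subquery. The sketch below runs it
through Python's sqlite3 module; the table emp, its columns vstart and
vend (a half-open valid-time interval in integer years) and the sample
rows are all invented for this example, and a temporal SQL might offer
a more direct formulation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, vstart INTEGER, vend INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("ann", 1985, 1988), ("ann", 1991, 1995), ("bob", 1986, 1993)],
)


def absent_throughout(conn, lo, hi):
    """Names with no employment period overlapping [lo, hi)."""
    rows = conn.execute(
        """
        SELECT DISTINCT e.name
        FROM emp AS e
        WHERE NOT EXISTS (
            SELECT 1
            FROM emp AS x
            WHERE x.name = e.name
              AND x.vstart < ?      -- period starts before the interval ends
              AND ? < x.vend        -- and ends after the interval starts
        )
        ORDER BY e.name
        """,
        (hi, lo),
    ).fetchall()
    return [name for (name,) in rows]


print(absent_throughout(conn, 1988, 1991))   # ['ann']
print(absent_throughout(conn, 1996, 1998))   # ['ann', 'bob']
```

Note that only entities with at least one row in emp can be reported;
an entity that never entered the database is invisible to this query.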
Comments/questions are welcome.
Nandlal Sarda